European scientists have developed a new artificial intelligence model, trained on large-scale health records, which can predict
susceptibility to more than 1,000 diseases decades into the future. The generative AI system, called Delphi-2M, was built at the
European Molecular Biology Laboratory in Cambridge, using “similar architecture to large language models but with key innovations
to work with healthcare data”, said Tom Fitzgerald of EMBL. Delphi was trained on anonymised medical records from 400,000
participants in UK Biobank. The researchers then tested the model successfully on data from 1.9mn patients in the Danish National
Patient Registry. The predictions across more than 1,000 diseases generally matched the accuracy of existing tools that have a far
narrower focus, such as the QRisk score for heart conditions. Results were published in Nature on Wednesday. “Our model is a proof
of concept, showing that it’s possible for AI to learn many of our long-term health patterns and use this information to generate
meaningful predictions,” said Ewan Birney, EMBL’s interim executive director. “We were surprised at how well the model transferred
from the UK to Denmark though it had never seen a single bit of Danish data.” Developing Delphi into a forecasting tool suitable
for routine clinical use with individual patients could take five to 10 years, added Birney, but it will be available much sooner
to guide healthcare strategies.
Please use the sharing tools found via the share button at the top or side of articles. Copying articles to share with others is a
breach of FT.com [http://FT.com] T&Cs [https://help.ft.com/help/legal-privacy/terms-conditions/] and Copyright Policy
[https://help.ft.com/help/legal-privacy/copyright/copyright-policy/]. Email licensing@ft.com [licensing@ft.com] to buy additional
rights. Subscribers may share up to 10 or 20 articles per month using the gift article service. More information can be found here
[https://www.ft.com/tour].
“Although it makes predictions for each individual, it can be very useful at the population level to forecast collective
healthcare needs, how many people will suffer from particular diseases such as heart attacks, cancers or diabetes and what sort of
treatment they need,” said Moritz Gerstung, head of AI at the German Cancer Research Center in Heidelberg, another member of the
Delphi team. The model gives the best predictions for conditions with consistent patterns of progression, including cardiovascular
disease, diabetes and blood poisoning. It works less well for diseases with unpredictable external causes and for very rare
congenital conditions.
The researchers are now working to extend Delphi by also incorporating biological data about individuals’ genes and proteins. But
Birney said they were “very pleasantly surprised” by how well it performed with healthcare information alone, giving predictions
as good as or better than other models that use genomics and proteomics. “I want to stress the power of the straightforward
medical record,” he added. The authors have patented some of the key ideas behind Delphi’s prediction of the risk and timing of
disease. “We are exploring whether there are commercialisation possibilities and how to do that with our respective institutions,”
said Birney. “This research looks to be a significant step towards a scalable, interpretable and, most importantly, ethically
responsible form of predictive modelling in medicine,” said Gustavo Sudre, professor of genomic neuroimaging and AI at King’s
College London, who was not involved in the project. “While the current version relies solely on anonymised clinical records, it
is encouraging to see that the model architecture has been designed to accommodate richer data types, such as biomarkers, imaging
and even genomics.”